Data Loader


MinatoLoader: Accelerating Machine Learning Training Through Efficient Data Preprocessing

Nouaji, Rahma, Bitchebe, Stella, Macedo, Ricardo, Balmau, Oana

arXiv.org Artificial Intelligence

Data loaders are used by Machine Learning (ML) frameworks like PyTorch and TensorFlow to apply transformations to data before feeding it into the accelerator. This operation is called data preprocessing. Data preprocessing plays an important role in the ML training workflow because, if it is inefficiently pipelined with the training, it can yield high GPU idleness, resulting in significant training delays. Unfortunately, existing data loaders turn out to waste GPU resources, with $76\%$ GPU idleness when using the PyTorch data loader, for example. One key source of inefficiency is the variability in preprocessing time across samples within the same dataset. Existing data loaders are oblivious to this variability and construct batches without any consideration of slow or fast samples. As a result, an entire batch is delayed by a single slow sample, stalling the training pipeline and causing head-of-line blocking. To address these inefficiencies, we present MinatoLoader, a general-purpose data loader for PyTorch that accelerates training and improves GPU utilization. MinatoLoader is designed for a single-server setup with multiple GPUs. It continuously prepares data in the background and actively constructs batches by prioritizing fast-to-preprocess samples, while slower samples are processed in parallel. We evaluate MinatoLoader on servers with V100 and A100 GPUs. On a machine with four A100 GPUs, MinatoLoader improves the training time of a wide range of workloads by up to $7.5\times$ ($3.6\times$ on average) over PyTorch DataLoader and Pecan, and up to $3\times$ ($2.2\times$ on average) over DALI. It also increases average GPU utilization from 46.4\% with PyTorch to 90.45\%, while preserving model accuracy and enabling faster convergence.
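The batching idea in the abstract, building batches from whichever samples finish preprocessing first while slow samples keep being processed in parallel, can be illustrated with a generic sketch. This shows only the scheduling concept, not MinatoLoader's implementation; the worker count, queues, and toy preprocess function are assumptions.

import queue
import random
import threading
import time

def preprocess(sample):
    # Toy transform: preprocessing time varies widely across samples.
    time.sleep(random.choice([0.001, 0.001, 0.001, 0.05]))
    return sample * 2

def worker(raw_queue, ready_queue):
    # Background workers publish each sample as soon as it is done, so a slow
    # sample never holds back samples that finish earlier.
    while (sample := raw_queue.get()) is not None:
        ready_queue.put(preprocess(sample))

def batches(ready_queue, batch_size, num_batches):
    # Batches are assembled from whichever samples are ready first, so
    # fast-to-preprocess samples are naturally prioritized.
    for _ in range(num_batches):
        yield [ready_queue.get() for _ in range(batch_size)]

if __name__ == "__main__":
    raw, ready = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(raw, ready)) for _ in range(4)]
    for w in workers:
        w.start()
    for i in range(64):
        raw.put(i)
    for batch in batches(ready, batch_size=8, num_batches=8):
        pass  # the training step on the accelerator would consume the batch here
    for _ in workers:
        raw.put(None)
    for w in workers:
        w.join()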


Mixtera: A Data Plane for Foundation Model Training

Böther, Maximilian, Yao, Xiaozhe, Kerimoglu, Tolga, Klimovic, Ana

arXiv.org Artificial Intelligence

State-of-the-art large language and vision models are trained over trillions of tokens that are aggregated from a large variety of sources. As training data collections grow, manually managing the samples becomes time-consuming, tedious, and prone to errors. Yet recent research shows that the data mixture and the order in which samples are visited during training can significantly influence model accuracy. We build and present Mixtera, a data plane for foundation model training that enables users to declaratively express which data samples should be used in which proportion and in which order during training. Mixtera is a centralized, read-only layer that is deployed on top of existing training data collections and can be declaratively queried. It operates independently of the filesystem structure and supports mixtures across arbitrary properties (e.g., language, source dataset) as well as dynamic adjustment of the mixture based on model feedback. We experimentally evaluate Mixtera and show that our implementation does not bottleneck training and scales to 256 GH200 superchips. We demonstrate how Mixtera supports recent advancements in mixing strategies by implementing the proposed Adaptive Data Optimization (ADO) algorithm in the system and evaluating its performance impact. We also explore the role of mixtures for vision-language models.
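As a rough illustration of declaratively specifying a data mixture, one can think of a mixture as a set of predicates over sample metadata with target proportions. The class, function, and property names below are hypothetical and are not Mixtera's actual API.

import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class MixtureComponent:
    # Hypothetical component: a predicate over sample metadata plus a target proportion.
    name: str
    predicate: Callable[[dict], bool]
    proportion: float

def sample_mixture(dataset, components, n):
    # Draw n samples so each component contributes roughly its declared proportion.
    buckets = {c.name: [s for s in dataset if c.predicate(s)] for c in components}
    out = []
    for c in components:
        out.extend(random.choices(buckets[c.name], k=round(n * c.proportion)))
    random.shuffle(out)
    return out

# Example: 70% English web text, 30% code, independent of filesystem layout.
dataset = [{"lang": "en", "source": "web"}] * 100 + [{"source": "code"}] * 100
mix = [
    MixtureComponent("english_web", lambda s: s.get("lang") == "en", 0.7),
    MixtureComponent("code", lambda s: s.get("source") == "code", 0.3),
]
batch = sample_mixture(dataset, mix, n=20)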


The Streaming Batch Model for Efficient and Fault-Tolerant Heterogeneous Execution

Luan, Frank Sifei, Mao, Ziming, Wang, Ron Yifeng, Lin, Charlotte, Kamsetty, Amog, Chen, Hao, Su, Cheng, Veeramani, Balaji, Lee, Scott, Cho, SangBin, Zinzow, Clark, Liang, Eric, Stoica, Ion, Wang, Stephanie

arXiv.org Artificial Intelligence

While ML model training and inference are both GPU-intensive, CPU-based data processing is often the bottleneck. Distributed data processing systems based on the batch or stream processing models assume homogeneous resource requirements. They excel at CPU-based computation but either under-utilize heterogeneous resources or impose high overheads on failure and reconfiguration. We introduce the streaming batch model, a hybrid of the two models that enables efficient and fault-tolerant heterogeneous execution. The key idea is to execute one partition at a time to allow lineage-based recovery with dynamic resource allocation. This enables memory-efficient pipelining across heterogeneous resources, similar to stream processing, but also offers the elasticity and fault tolerance properties of batch processing. We present Ray Data, an implementation of the streaming batch model that improves throughput on heterogeneous batch inference pipelines by 3--8$\times$ compared to traditional batch and stream processing systems. When training Stable Diffusion, Ray Data matches the throughput of single-node ML data loaders while additionally leveraging distributed heterogeneous clusters to further improve training throughput by 31%.
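The general shape of such a pipeline in Ray Data, a CPU preprocessing stage streamed into a consumer stage where GPU work would run, looks roughly like the sketch below; the toy dataset, batch formats, and arguments are assumptions that may vary across Ray versions.

import numpy as np
import ray

def normalize(batch):
    # CPU stage: a cheap per-batch transform.
    batch["value"] = batch["value"] / 255.0
    return batch

ds = ray.data.from_items([{"value": np.float64(i % 256)} for i in range(10_000)])
ds = ds.map_batches(normalize, batch_size=256)

# Consumer stage: iterate over batches as they become ready; a GPU inference or
# training step would run here, overlapping with the CPU stage above.
total = 0
for batch in ds.iter_batches(batch_size=256):
    total += len(batch["value"])
print(total)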


BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery

John, Peter St., Lin, Dejun, Binder, Polina, Greaves, Malcolm, Shah, Vega, John, John St., Lange, Adrian, Hsu, Patrick, Illango, Rajesh, Ramanathan, Arvind, Anandkumar, Anima, Brookes, David H, Busia, Akosua, Mahajan, Abhishaike, Malina, Stephen, Prasad, Neha, Sinai, Sam, Edwards, Lindsay, Gaudelet, Thomas, Regep, Cristian, Steinegger, Martin, Rost, Burkhard, Brace, Alexander, Hippe, Kyle, Naef, Luca, Kamata, Keisuke, Armstrong, George, Boyd, Kevin, Cao, Zhonglin, Chou, Han-Yi, Chu, Simon, Costa, Allan dos Santos, Darabi, Sajad, Dawson, Eric, Didi, Kieran, Fu, Cong, Geiger, Mario, Gill, Michelle, Hsu, Darren, Kaushik, Gagan, Korshunova, Maria, Kothen-Hill, Steven, Lee, Youhan, Liu, Meng, Livne, Micha, McClure, Zachary, Mitchell, Jonathan, Moradzadeh, Alireza, Mosafi, Ohad, Nashed, Youssef, Paliwal, Saee, Peng, Yuxing, Rabhi, Sara, Ramezanghorbani, Farhad, Reidenbach, Danny, Ricketts, Camir, Roland, Brian, Shah, Kushal, Shimko, Tyler, Sirelkhatim, Hassan, Srinivasan, Savitha, Stern, Abraham C, Toczydlowska, Dorota, Veccham, Srimukh Prasad, Venanzi, Niccolò Alberto Elia, Vorontsov, Anton, Wilber, Jared, Wilkinson, Isabel, Wong, Wei Jing, Xue, Eva, Ye, Cory, Yu, Xin, Zhang, Yang, Zhou, Guoqing, Zandstein, Becca, Dallago, Christian, Trentini, Bruno, Kucukbenli, Emine, Paliwal, Saee, Rvachov, Timur, Calleja, Eddie, Israeli, Johnny, Clifford, Harry, Haukioja, Risto, Haemel, Nicholas, Tretina, Kyle, Tadimeti, Neha, Costa, Anthony B

arXiv.org Artificial Intelligence

Artificial Intelligence models encoding biology and chemistry are opening new routes to high-throughput and high-quality in-silico drug development. However, their training increasingly relies on computational scale, with recent protein language models (pLMs) trained on hundreds of graphics processing units (GPUs). We introduce the BioNeMo Framework to facilitate the training of computational biology and chemistry AI models across hundreds of GPUs. Its modular design allows the integration of individual components, such as data loaders, into existing workflows and is open to community contributions. We detail technical features of the BioNeMo Framework through use cases such as pLM pre-training and fine-tuning. On 256 NVIDIA A100s, BioNeMo Framework trains a three billion parameter BERT-based pLM on over one trillion tokens in 4.2 days. The BioNeMo Framework is open-source and free for everyone to use.


TensorSocket: Shared Data Loading for Deep Learning Training

Robroek, Ties, Nielsen, Neil Kim, Tözün, Pınar

arXiv.org Artificial Intelligence

Training deep learning models is a repetitive and resource-intensive process. Data scientists often train several models before landing on the set of parameters (e.g., hyper-parameter tuning), model architecture (e.g., neural architecture search), and other design choices that yield the highest accuracy. The computational efficiency of these training tasks depends highly on how well we can supply the training process with training data. The repetitive nature of these tasks results in the same data processing pipelines running over and over, exacerbating the need for and costs of computational resources. In this paper, we present TensorSocket to reduce the computational needs of deep learning training by enabling simultaneous training processes to share the same data loader. TensorSocket mitigates CPU-side bottlenecks in cases where the collocated training workloads have high throughput on GPU but are held back by lower data-loading throughput on CPU. TensorSocket achieves this by reducing redundant computations across collocated training processes and leveraging modern GPU-GPU interconnects. We demonstrate the hardware- and pipeline-agnostic nature of TensorSocket and evaluate it using a variety of training scenarios. Our evaluation shows that TensorSocket enables scenarios that are infeasible without data sharing, increases training throughput by up to $100\%$, and, when utilizing cloud instances, achieves cost savings of $50\%$ by reducing the hardware resource needs on the CPU side. Furthermore, TensorSocket outperforms state-of-the-art solutions for shared data loading such as CoorDL and Joader. It is easier to use, maintain, and deploy, and either matches or exceeds the throughput of other solutions while requiring fewer CPU resources.
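The core idea, one producer preparing each batch once and fanning it out to several collocated training processes instead of each process running its own redundant pipeline, can be sketched with standard multiprocessing primitives. This illustrates the concept only and is not TensorSocket's interface; the queue-based transport and toy batches are assumptions.

import multiprocessing as mp

def shared_loader(queues, num_batches):
    # A single producer prepares each batch once and sends it to every consumer,
    # removing the redundant per-process data pipelines.
    for step in range(num_batches):
        batch = [step * 8 + i for i in range(8)]  # stand-in for a preprocessed batch
        for q in queues:
            q.put(batch)
    for q in queues:
        q.put(None)

def trainer(q, name):
    # Each collocated training process consumes the shared stream of batches.
    while (batch := q.get()) is not None:
        pass  # forward/backward pass would go here
    print(f"{name} done")

if __name__ == "__main__":
    queues = [mp.Queue() for _ in range(2)]
    consumers = [mp.Process(target=trainer, args=(q, f"trainer-{i}"))
                 for i, q in enumerate(queues)]
    producer = mp.Process(target=shared_loader, args=(queues, 100))
    for p in consumers + [producer]:
        p.start()
    for p in consumers + [producer]:
        p.join()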


FFCV: Accelerating Training by Removing Data Bottlenecks

Leclerc, Guillaume, Ilyas, Andrew, Engstrom, Logan, Park, Sung Min, Salman, Hadi, Madry, Aleksander

arXiv.org Artificial Intelligence

We present FFCV, a library for easy and fast machine learning model training. FFCV speeds up model training by eliminating (often subtle) data bottlenecks from the training process. In particular, we combine techniques such as an efficient file storage format, caching, data pre-loading, asynchronous data transfer, and just-in-time compilation to (a) make data loading and transfer significantly more efficient, ensuring that GPUs can reach full utilization; and (b) offload as much data processing as possible to the CPU asynchronously, freeing GPU cycles for training. Using FFCV, we train ResNet-18 and ResNet-50 on the ImageNet dataset with a competitive tradeoff between accuracy and training time. For example, we are able to train an ImageNet ResNet-50 model to 75\% accuracy in only 20 minutes on a single machine. We demonstrate FFCV's performance, ease-of-use, extensibility, and ability to adapt to resource constraints through several case studies. Detailed installation instructions, documentation, and a Slack support channel are available at https://ffcv.io/ .
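A minimal FFCV usage pattern has two steps: write the dataset once into FFCV's storage format, then load it through a decoding pipeline at training time. The sketch below follows the public FFCV documentation as best recalled; field and decoder names should be checked against https://ffcv.io/, and the CIFAR10 stand-in dataset is an assumption.

from ffcv.writer import DatasetWriter
from ffcv.fields import RGBImageField, IntField
from ffcv.fields.decoders import SimpleRGBImageDecoder, IntDecoder
from ffcv.loader import Loader, OrderOption
from ffcv.transforms import ToTensor
from torchvision.datasets import CIFAR10

# One-time conversion: serialize an indexed (image, label) dataset into FFCV's format.
dataset = CIFAR10('data', train=True, download=True)
writer = DatasetWriter('train.beton', {
    'image': RGBImageField(max_resolution=256),
    'label': IntField(),
})
writer.from_indexed_dataset(dataset)

# Training-time loader: decoding and transforms run asynchronously on the CPU.
loader = Loader('train.beton', batch_size=256, num_workers=8,
                order=OrderOption.RANDOM,
                pipelines={'image': [SimpleRGBImageDecoder(), ToTensor()],
                           'label': [IntDecoder(), ToTensor()]})

for images, labels in loader:
    pass  # training step on the GPU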


Introducing HuggingFace Accelerate

#artificialintelligence

Hugging Face Accelerate is a library for simplifying and accelerating the training and inference of deep learning models. It provides an easy-to-use API that abstracts away much of the low-level detail of distributed training and mixed-precision training. In this article, we'll introduce the key concepts and features of Hugging Face Accelerate. In the example below, which trains a deep learning model using PyTorch, we first create an instance of the accelerator using the Accelerator() constructor. We then create instances of our model and our data loader.
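A minimal sketch of that pattern follows; the toy dataset, model, and optimizer are illustrative assumptions rather than the article's original listing, while the Accelerator calls follow the library's documented API.

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Toy data, model, and optimizer.
dataset = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=32, shuffle=True)
model = nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# The Accelerator wraps the model, optimizer, and data loader so the same loop
# runs on CPU, a single GPU, or multiple GPUs, optionally with mixed precision.
accelerator = Accelerator()
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

model.train()
for epoch in range(3):
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(inputs), targets)
        accelerator.backward(loss)  # replaces loss.backward()
        optimizer.step()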


Pipeline Parallelism - DeepSpeed

#artificialintelligence

DeepSpeed v0.3 includes new support for pipeline parallelism! Pipeline parallelism improves both the memory and compute efficiency of deep learning training by partitioning the layers of a model into stages that can be processed in parallel. DeepSpeed's training engine provides hybrid data and pipeline parallelism, which can be further combined with model parallelism such as Megatron-LM to form 3D parallelism. Our latest results demonstrate that this 3D parallelism enables training models with over a trillion parameters.
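A rough sketch of how a model is expressed as pipeline stages with DeepSpeed, following its pipeline-parallelism tutorial: the layer list, stage count, synthetic data, and ds_config.json contents are assumptions, and the script must be launched with the deepspeed launcher on multiple GPUs.

import torch
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule

# Express the model as a flat list of layers so DeepSpeed can split it into stages.
layers = [
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 10),
]

# Partition the layers across two pipeline stages; the remaining GPUs are used
# for data parallelism (hybrid data + pipeline parallelism).
net = PipelineModule(layers=layers, loss_fn=nn.CrossEntropyLoss(), num_stages=2)

# Runtime settings (batch sizes, optimizer, fp16, ...) come from ds_config.json.
engine, _, _, _ = deepspeed.initialize(model=net,
                                       model_parameters=net.parameters(),
                                       config='ds_config.json')

def synthetic_data():
    # Stand-in for a real data loader: yields (inputs, labels) tuples.
    while True:
        yield torch.randn(8, 1024), torch.randint(0, 10, (8,))

data_iter = synthetic_data()
for step in range(100):
    loss = engine.train_batch(data_iter=data_iter)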


Transforming a Horse to a Zebra Using A Generative Adversarial Network (GAN)

#artificialintelligence

When two deep learning models are trained against each other in a zero-sum game, one agent's gain is the other agent's loss. This setup is known as a generative adversarial network (GAN). Starting from a training set, the approach learns to generate new data with the same statistics as the training set. A GAN can create new images with realistic elements that look a lot like the originals. GANs are popular because they produce convincingly realistic results.
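The zero-sum structure is easiest to see in a minimal training step: the discriminator gains by telling real samples from generated ones, and the generator gains by fooling it. The toy PyTorch sketch below illustrates that loop on synthetic data; it is not the horse-to-zebra CycleGAN itself, which adds cycle-consistency losses on image pairs.

import torch
from torch import nn

# Toy 1-D "samples" so the adversarial loop is easy to follow.
generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 16))
discriminator = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    real = torch.randn(64, 16) + 2.0  # stand-in for real training data
    noise = torch.randn(64, 8)

    # Discriminator step: its gain is labelling real as 1 and generated as 0.
    fake = generator(noise).detach()
    d_loss = (bce(discriminator(real), torch.ones(64, 1)) +
              bce(discriminator(fake), torch.zeros(64, 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator step: its gain is making the discriminator call its output real.
    g_loss = bce(discriminator(generator(noise)), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()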


A QuickStart to Pytorch Tutorial. In machine learning, there is a…

#artificialintelligence

In machine learning, there is a workflow that involves working with data, creating models, optimizing model parameters, and saving trained models. Frameworks codify the best practices for implementing this workflow. PyTorch, an ML framework based on the Torch library, represents data as tensors: multidimensional rectangular arrays that can run on CUDA-capable Nvidia GPUs. In this tutorial, we will use the FashionMNIST dataset to train a neural network that predicts which class an input image belongs to. Once you have the prerequisites and PyTorch installed, you can start the tutorial.
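Condensed, the workflow the tutorial describes, downloading FashionMNIST, wrapping it in a DataLoader, and training a small classifier on a CUDA device if one is available, looks roughly like this. It is a sketch of the standard PyTorch pattern, with an assumed toy architecture and learning rate rather than the tutorial's exact model.

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

# Download FashionMNIST and wrap it in a DataLoader that yields mini-batches.
training_data = datasets.FashionMNIST(root="data", train=True, download=True,
                                      transform=ToTensor())
train_loader = DataLoader(training_data, batch_size=64, shuffle=True)

# A small classifier; tensors and the model move to a CUDA-capable GPU if present.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Flatten(),
                      nn.Linear(28 * 28, 512), nn.ReLU(),
                      nn.Linear(512, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in train_loader:
    images, labels = images.to(device), labels.to(device)
    loss = loss_fn(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()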